Developing a Phonemic and Syllabic Frequency Inventory for Spontaneous Spoken Castilian Spanish and their Comparison to Text-Based Inventories
نویسندگان
چکیده
In this paper we present our recent work to develop phonemic and syllabic inventories for Castilian Spanish based on the C-ORAL-ROM corpus, a spontaneous spoken Spanish with varying degrees of naturalness and in different communicative contexts. These inventories have been developed by means of a phonemic and syllabic automatic transcriptor whose output has been assessed by manually reviewing most of the transcriptions. The inventories include absolute frequencies of occurrence of the different phones and syllables. These frequencies have been contrasted against an inventory extracted from a comparable textual corpus, finding evidence that the available inventories, based mainly on text, do not provide an accurate description of spontaneously spoken Castilian Spanish.
منابع مشابه
How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies
The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel approach for constructing speech motor exercises, based on linguistic knowledge extracted from spoken language corpora. In our study with the Dutch Spoken Corpus, syllabic inventories were obtained by means of automatic ...
متن کاملSpeech Characteristics and Intelligibility in Adults with Mild and Moderate Intellectual Disabilities.
PURPOSE Adults with intellectual disabilities (ID) often show reduced speech intelligibility, which affects their social interaction skills. This study aims to establish the main predictors of this reduced intelligibility in order to ultimately optimise management. METHOD Spontaneous speech and picture naming tasks were recorded in 36 adults with mild or moderate ID. Twenty-five naïve listene...
متن کاملMulti-level rhythm control for speech synthesis using hybrid data driven and rule-based approaches
This paper presents: a multi-level concept to generate the speech rhythm in the Dresden TTS system for German (DreSS). The rhythm control includes the phrase, the syllabic and the phonemic level. The concept allows the alternative use of rulebased or statistical, but also data driven methods on these levels. To create the rules and to train a neural network, a new speech corpus from original sp...
متن کاملSpanish broadcast news transcription
We describe the Sail Labs Media Mining System (MMS) aimed at the transcription of Castilian Spanish broadcastnews. In contrast to previous systems, the focus of this system is on Spanish as spoken on the Iberian Peninsula as opposed to the Americas. We discuss the development of a Castilian Spanish broadcast-news corpus suitable for training the various system components of the MMS and report o...
متن کاملSpanish dialects: phonetic transcription
It is well known that canonical Spanish, the dialectal variant ‘central’ of Spain, so called Castilian, can be transcribed by rules. This paper deals with the automatic grapheme to phoneme transcription rules in several Spanish dialects from Latin America. Spanish is a language spoken by more than 300 million people, has an important geographical dispersion compared among other languages and ha...
متن کامل